Similarity Search on Bregman Divergence: Towards Non-Metric Indexing

Authors

  • Zhenjie Zhang
  • Beng Chin Ooi
  • Srinivasan Parthasarathy
  • Anthony K. H. Tung
Abstract

In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman divergence [6], to support nearest neighbor and range queries. Distance functions such as KL-divergence and Itakura-Saito distance are special cases of Bregman divergence, with wide applications in statistics, speech recognition, and time series analysis, among others. Unlike in metric spaces, key properties such as the triangle inequality and distance symmetry do not hold for such distance functions. A direct adaptation of existing indexing infrastructure developed for metric spaces is thus not possible. We devise a novel solution to handle this class of distance measures by expanding and mapping points in the original space to a new extended space. Subsequently, we show how state-of-the-art tree-based indexing methods, for low- to moderate-dimensional datasets, and vector approximation file (VA-file) methods, for high-dimensional datasets, can be adapted on this extended space to answer such queries efficiently. We also introduce improved distance bounding techniques, which speed up query answering, and distribution-based index optimization, which speeds up index construction; both can be applied to R-trees and VA-files. Extensive experiments are conducted to validate our approach on a variety of datasets and a range of Bregman divergence functions.
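To make the abstract's terms concrete, here is a minimal Python sketch of the standard definition D_f(x, y) = f(x) - f(y) - <∇f(y), x - y>. It shows how KL-divergence and Itakura-Saito distance arise from particular generators f, and why symmetry and the triangle inequality fail. The last part illustrates one textbook way of lifting points to an extended space, appending f(x) as an extra coordinate so that a range condition becomes a linear half-space test. The abstract does not spell out the paper's mapping, so this lifting, the function names, and the convention of placing the query in the second argument are all assumptions made for illustration.

```python
import math

def bregman(f, grad_f, x, y):
    """Generic Bregman divergence D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    g = grad_f(y)
    return f(x) - f(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))

# Generator 1: negative entropy, which yields the (generalized) KL-divergence.
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

# Generator 2: Burg entropy, which yields the Itakura-Saito distance.
def burg(x):
    return -sum(math.log(xi) for xi in x)

def burg_grad(x):
    return [-1.0 / xi for xi in x]

# Closed forms, used only to cross-check the generic definition.
def kl(x, y):
    return sum(xi * math.log(xi / yi) - xi + yi for xi, yi in zip(x, y))

def itakura_saito(x, y):
    return sum(xi / yi - math.log(xi / yi) - 1.0 for xi, yi in zip(x, y))

# --- Illustrative lifting (an assumption, not taken from the abstract) ---
def lift(f, x):
    """Map x in R^d to (x, f(x)) in R^(d+1)."""
    return list(x) + [f(x)]

def range_halfspace(f, grad_f, q, r):
    """Return (w, b) such that D_f(x, q) <= r  iff  <w, lift(x)> <= b."""
    g = grad_f(q)
    w = [-gi for gi in g] + [1.0]
    b = r + f(q) - sum(gi * qi for gi, qi in zip(g, q))
    return w, b

def range_query_direct(f, grad_f, db, q, r):
    """Brute-force range query using the divergence itself."""
    return [i for i, x in enumerate(db) if bregman(f, grad_f, x, q) <= r]

def range_query_lifted(f, grad_f, db, q, r):
    """Same query, answered with a linear test in the lifted space."""
    w, b = range_halfspace(f, grad_f, q, r)
    return [i for i, x in enumerate(db)
            if sum(wi * zi for wi, zi in zip(w, lift(f, x))) <= b]

if __name__ == "__main__":
    p, q, z = [0.5, 0.5], [0.9, 0.1], [0.99, 0.01]
    print("KL(p, q) =", kl(p, q), " KL(q, p) =", kl(q, p))   # not symmetric
    print("KL(p, z) =", kl(p, z),
          " KL(p, q) + KL(q, z) =", kl(p, q) + kl(q, z))     # triangle inequality fails
    db = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    print(range_query_direct(neg_entropy, neg_entropy_grad, db, [0.5, 0.5], 0.1))
    print(range_query_lifted(neg_entropy, neg_entropy_grad, db, [0.5, 0.5], 0.1))
```

Because the lifted test is linear in the extended coordinates, it can in principle be evaluated against bounding rectangles or quantized cells, which is the kind of structure R-trees and VA-files maintain. Whether the paper uses exactly this construction is not stated in the abstract.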

Similar resources

Hardness and Non-Approximability of Bregman Clustering Problems

We prove the computational hardness of three k-clustering problems using an (almost) arbitrary Bregman divergence as dissimilarity measure: (a) The Bregman k-center problem, where the objective is to find a set of centers that minimizes the maximum dissimilarity of any input point towards its closest center, and (b) the Bregman k-diameter problem, where the objective is to minimize the maximum ...

Efficient Bregman Range Search

We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tas...

Overlapping clustering based on kernel similarity metric

Producing overlapping schemes is a major issue in clustering. Recently proposed overlapping methods rely on the search for an optimal covering and are based on different metrics, such as Euclidean distance and I-Divergence, used to measure closeness between observations. In this paper, we propose the use of another measure for overlapping clustering based on a kernel similarity metric. We also e...

Non-Metric Locality-Sensitive Hashing

Non-metric distances are often more reasonable compared with metric ones in terms of consistency with human perceptions. However, existing locality-sensitive hashing (LSH) algorithms can only support data which are gauged with metrics. In this paper we propose a novel locality-sensitive hashing algorithm targeting such non-metric data. Data in original feature space are embedded into an implici...

The Basic Principles of Metric Indexing

This chapter describes several methods of similarity search, based on metric indexing, in terms of their common, underlying principles. Several approaches to creating lower bounds using the metric axioms are discussed, such as pivoting and compact partitioning with metric ball regions and generalized hyperplanes. Finally, pointers are given for further exploration of the subject, including non-...

Journal:
  • PVLDB

Volume 2  Issue

Pages  -

Publication date 2009